Weakly-supervised Temporal Action Localization by Uncertainty Modeling

نویسندگان

چکیده

Weakly-supervised temporal action localization aims to learn detecting intervals of classes with only video-level labels. To this end, it is crucial separate frames from the background (i.e., not belonging any classes). In paper, we present a new perspective on where they are modeled as out-of-distribution samples regarding their inconsistency. Then, can be detected by estimating probability each frame being out-of-distribution, known uncertainty, but infeasible directly uncertainty without frame-level realize learning in weakly-supervised setting, leverage multiple instance formulation. Moreover, further introduce entropy loss better discriminate encouraging in-distribution (action) probabilities uniformly distributed over all classes. Experimental results show that our modeling effective at alleviating interference and brings large performance gain bells whistles. We demonstrate model significantly outperforms state-of-the-art methods benchmarks, THUMOS'14 ActivityNet (1.2 & 1.3). Our code available https://github.com/Pilhyeon/WTAL-Uncertainty-Modeling.

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Weakly Supervised Action Localization by Sparse Temporal Pooling Network

We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks. Our algorithm learns from video-level class labels and predicts temporal intervals of human actions with no requirement of temporal localization annotations. We design our network to identify a sparse subset of key segments associated with target actions in a video usin...

متن کامل

Towards Weakly-Supervised Action Localization

This paper presents a novel approach for weakly-supervised action localization, i.e., that does not require per-frame spatial annotations for training. We first introduce an effective method for extracting human tubes by combining a state-of-the-art human detector with a tracking-by-detection approach. Our tube extraction leverages the large amount of annotated humans available today and outper...

متن کامل

Connectionist Temporal Modeling for Weakly Supervised Action Labeling

We propose a weakly-supervised framework for action labeling in video, where only the order of occurring actions is required during training time. The key challenge is that the per-frame alignments between the input (video) and label (action) sequences are unknown during training. We address this by introducing the Extended Connectionist Temporal Classification (ECTC) framework to efficiently e...

متن کامل

Action Recognition by Weakly-Supervised Discriminative Region Localization

We present a novel probabilistic model for recognizing actions by identifying and extracting information from discriminative regions in videos. The model is trained in a weakly-supervised manner: training videos are annotated only with training label without any action location information within the video. Additionally, we eliminate the need for any pre-processing measures to help shortlist ca...

متن کامل

Weakly Supervised Action Detection

Detection of human action in videos has many applications such as video surveillance and content based video retrieval. Actions can be considered as spatio-temporal objects corresponding to spatio-temporal volumes in a video. The problem of action detection can thus be solved similarly to object detection in 2D images [3] where typically an object classifier is trained using positive and negati...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Proceedings of the ... AAAI Conference on Artificial Intelligence

سال: 2021

ISSN: ['2159-5399', '2374-3468']

DOI: https://doi.org/10.1609/aaai.v35i3.16280